AITopics | tone classification

CAT-Net: A Cross-Attention Tone Network for Cross-Subject EEG-EMG Fusion Tone Decoding

Zhuang, Yifan, Huang, Calvin, Yu, Zepeng, Zou, Yongjie, Ju, Jiawei

arXiv.org Artificial Intelligence, Nov-17-2025

Brain-computer interface (BCI) speech decoding has emerged as a promising tool for assisting individuals with speech impairments. In this context, the integration of electroencephalography (EEG) and electromyography (EMG) signals offers strong potential for enhancing decoding performance. Mandarin tone classification presents particular challenges, as tonal variations convey distinct meanings even when phonemes remain identical. In this study, we propose a novel cross-subject multimodal BCI decoding framework that fuses EEG and EMG signals to classify four Mandarin tones under both audible and silent speech conditions. Inspired by the cooperative mechanisms of neural and muscular systems in speech production, our neural decoding architecture combines spatial-temporal feature extraction branches with a cross-attention fusion mechanism, enabling informative interaction between modalities. We further incorporate domain-adversarial training to improve cross-subject generalization. We collected 4,800 EEG trials and 4,800 EMG trials from 10 participants using only twenty EEG and five EMG channels, demonstrating the feasibility of minimal-channel decoding. Despite employing lightweight modules, our model outperforms state-of-the-art baselines across all conditions, achieving average classification accuracies of 87.83% for audible speech and 88.08% for silent speech. In cross-subject evaluations, it still maintains strong performance with accuracies of 83.27% and 85.10% for audible and silent speech, respectively. We further conduct ablation studies to validate the effectiveness of each component. Our findings suggest that tone-level decoding with minimal EEG-EMG channels is feasible and potentially generalizable across subjects, contributing to the development of practical BCI applications.
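The cross-attention fusion idea in the abstract can be illustrated with a minimal sketch. This is not the CAT-Net architecture: the function names (`cross_attention`, `fuse`), the toy feature matrices, the mean pooling, and the absence of learned projection layers are all assumptions made for illustration. It only shows the general mechanism of one modality attending over the other with scaled dot-product attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.

    Learned projection matrices are omitted for brevity (an assumption,
    not a description of the paper's model). Returns (outputs, weights).
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        # Weighted sum of the value rows, one output dimension at a time.
        out = [sum(wj * v[d] for wj, v in zip(w, values))
               for d in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

def mean_pool(rows):
    """Average a list of feature rows into a single feature vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fuse(eeg_feats, emg_feats):
    """Hypothetical fusion: each modality attends to the other, then the
    mean-pooled context vectors are concatenated."""
    eeg_ctx, _ = cross_attention(eeg_feats, emg_feats, emg_feats)
    emg_ctx, _ = cross_attention(emg_feats, eeg_feats, eeg_feats)
    return mean_pool(eeg_ctx) + mean_pool(emg_ctx)

# Toy features: 3 EEG time steps and 2 EMG time steps, 4 dimensions each.
eeg = [[0.2, 0.1, 0.0, 0.5], [0.9, 0.3, 0.4, 0.1], [0.0, 0.7, 0.2, 0.6]]
emg = [[0.5, 0.5, 0.1, 0.0], [0.1, 0.0, 0.8, 0.3]]
fused = fuse(eeg, emg)  # 8-dimensional vector for a downstream tone classifier
```

In a real model the fused vector would feed a classifier over the four Mandarin tones, and the queries, keys, and values would come from learned projections of the spatial-temporal branch outputs.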

machine learning, natural language, speech condition

arXiv:2511.10935

Country: Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.88)

Colors Matter: AI-Driven Exploration of Human Feature Colors

Alyoubi, Rama, Alharbi, Taif, Alghamdi, Albatul, Alshehri, Yara, Alghamdi, Elham

arXiv.org Artificial Intelligence, May-22-2025

This study presents a robust framework that leverages advanced imaging techniques and machine learning for feature extraction and classification of key human attributes, namely skin tone, hair color, iris color, and vein-based undertones. The system employs a multi-stage pipeline involving face detection, region segmentation, and dominant color extraction to isolate and analyze these features. Techniques such as X-means clustering, alongside perceptually uniform distance metrics like Delta E (CIEDE2000), are applied within both LAB and HSV color spaces to enhance the accuracy of color differentiation. For classification, the dominant tones of the skin, hair, and iris are extracted and matched to a custom tone scale, while vein analysis from wrist images enables undertone classification into "Warm" or "Cool" based on LAB differences. Each module uses targeted segmentation and color space transformations to ensure perceptual precision. The system achieves up to 80% accuracy in tone classification using the Delta E-HSV method with Gaussian blur, demonstrating reliable performance across varied lighting and image conditions. This work highlights the potential of AI-powered color analysis and feature extraction for delivering inclusive, precise, and nuanced classification, supporting applications in beauty technology, digital personalization, and visual analytics.
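As a rough illustration of matching a dominant color to a tone scale by perceptual distance, the sketch below converts sRGB to CIELAB and picks the nearest reference tone. Note the substitution: it uses the simpler CIE76 Delta E (Euclidean distance in LAB), not the CIEDE2000 formula named in the abstract. The tone scale `TONE_SCALE`, its RGB values, and the function names are hypothetical and do not come from the paper.

```python
import math

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB under a D65 white point."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB to CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e76(lab1, lab2):
    """CIE76 color difference: plain Euclidean distance in LAB space."""
    return math.dist(lab1, lab2)

# Hypothetical three-step tone scale (illustrative RGB values only).
TONE_SCALE = {
    "light": (240, 213, 190),
    "medium": (198, 149, 112),
    "deep": (110, 70, 50),
}

def nearest_tone(rgb, scale=TONE_SCALE):
    """Return the name of the scale entry with the smallest Delta E to rgb."""
    lab = srgb_to_lab(rgb)
    return min(scale, key=lambda name: delta_e76(lab, srgb_to_lab(scale[name])))

dominant_color = (200, 150, 115)  # e.g. output of a dominant-color extraction step
matched = nearest_tone(dominant_color)
```

Swapping `delta_e76` for a CIEDE2000 implementation would change only the distance function; the matching step stays the same.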

artificial intelligence, classification, machine learning

arXiv:2505.14931

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Normalization through Fine-tuning: Understanding Wav2vec 2.0 Embeddings for Phonetic Analysis

Wang, Yiming, Yang, Yi, Yuan, Jiahong

arXiv.org Artificial Intelligence, Mar-4-2025

Phonetic normalization plays a crucial role in speech recognition and analysis, ensuring the comparability of features derived from raw audio data. However, in the current paradigm of fine-tuning pre-trained large transformer models, phonetic normalization is not deemed a necessary step; instead, it is implicitly executed within the models. This study investigates the normalization process within transformer models, especially wav2vec 2.0. Through a comprehensive analysis of embeddings from models fine-tuned for various tasks, our results demonstrate that fine-tuning wav2vec 2.0 effectively achieves phonetic normalization by selectively suppressing task-irrelevant information. We found that models fine-tuned for multiple tasks retain information for both tasks without compromising performance, and that suppressing task-irrelevant information is not necessary for effective classification. These findings provide new insights into how phonetic normalization can be flexibly achieved in speech models and how it is realized in human speech perception.

correlation, information, normalization

arXiv:2503.04814

Country:

Asia > China > Anhui Province > Hefei (0.05)
Europe > Czechia (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.49)

Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis

Ahbabi, Hamdan Al, Marti, Gautier, AlMarri, Saeed, Elfadel, Ibrahim

arXiv.org Artificial Intelligence, Feb-26-2025

Self-supervised learning models for speech processing, such as wav2vec2, HuBERT, WavLM, and Whisper, generate embeddings that capture both linguistic and paralinguistic information, making it challenging to analyze tone independently of spoken content. In this work, we introduce a method for disentangling paralinguistic features from linguistic content by regressing speech embeddings onto their corresponding text embeddings and using the residuals as a representation of vocal tone. We evaluate this approach across multiple self-supervised speech embeddings, demonstrating that residual embeddings significantly improve tone classification performance compared to raw speech embeddings. Our results show that this method enhances linear separability, enabling improved classification even with simple models such as logistic regression. Visualization of the residual embeddings further confirms the successful removal of linguistic information while preserving tone-related features.
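The regression step described in the abstract can be sketched with ordinary least squares on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are tiny hand-made vectors rather than wav2vec2 or Whisper outputs, the intercept term is an added assumption, and the helper names (`solve`, `residual_embeddings`) are hypothetical. Pure Python is used to keep the example dependency-free; a numerical library would be the usual choice in practice.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def residual_embeddings(speech, text):
    """Regress speech embeddings on text embeddings (plus an intercept)
    and return the residuals, i.e. what the text cannot explain linearly."""
    x = [[1.0] + list(row) for row in text]       # prepend a bias column
    xt = transpose(x)
    w = solve(matmul(xt, x), matmul(xt, speech))  # normal equations
    pred = matmul(x, w)
    return [[s - p for s, p in zip(srow, prow)]
            for srow, prow in zip(speech, pred)]

# Toy data: 6 utterances, 2-d "text" embeddings, 3-d "speech" embeddings.
text = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.5, 2.0], [1.5, 0.5]]
tone = [1, -1, 1, -1, 1, -1]  # hypothetical tone label per utterance
speech = [[t[0] + t[1] + 0.3 * k, t[0] - t[1], 2 * t[0] - 0.3 * k]
          for t, k in zip(text, tone)]

residuals = residual_embeddings(speech, text)
```

In this toy setup the second speech dimension is an exact linear function of the text embedding, so its residual vanishes, while the first and third dimensions keep a tone-dependent component that a simple classifier such as logistic regression could then use.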

linguistic content, speech, tone classification

arXiv:2502.19387

Country: Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.16)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.92)
